AITopics | cross-lingual text classification

Universal Cross-Lingual Text Classification

Savant, Riya, Shelke, Anushka, Todmal, Sakshi, Kanphade, Sanskruti, Joshi, Ananya, Joshi, Raviraj

arXiv.org Artificial Intelligence, Jun-16-2024

Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating supervised labeled datasets for low-resource languages poses a considerable challenge. Unlocking the language potential of low-resource languages requires robust datasets with supervised labels. However, such datasets are scarce, and the label space is often limited. In our pursuit to address this gap, we aim to optimize existing labels/datasets in different languages. This research proposes a novel perspective on Universal Cross-Lingual Text Classification, leveraging a unified model across languages. Our approach involves blending supervised data from different languages during training to create a universal model. The supervised data for a target classification task might come from different languages covering different labels. The primary goal is to enhance label and language coverage, aiming for a label set that represents a union of labels from various languages. We propose the usage of a strong multilingual SBERT as our base model, making our novel training strategy feasible. This strategy contributes to the adaptability and effectiveness of the model in cross-lingual language transfer scenarios, where it can categorize text in languages not encountered during training. Thus, the paper delves into the intricacies of cross-lingual text classification, with a particular focus on its application for low-resource languages, exploring methodologies and implications for the development of a robust and adaptable universal cross-lingual model.
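The data-blending idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the language codes, texts, labels, and the helper names `build_union_label_set` and `blend` are all hypothetical, and the multilingual SBERT encoding and classifier training steps are omitted. It only shows how per-language datasets covering different labels might be merged under a single label set that is the union of all labels.

```python
from typing import Dict, List, Tuple

# Hypothetical toy data: each language covers a different subset of labels.
DATASETS: Dict[str, List[Tuple[str, str]]] = {
    "lang_a": [("text a1", "sports"), ("text a2", "politics")],
    "lang_b": [("text b1", "entertainment"), ("text b2", "sports")],
    "lang_c": [("text c1", "business")],
}


def build_union_label_set(datasets: Dict[str, List[Tuple[str, str]]]) -> Dict[str, int]:
    """Map every label seen in any language to one shared integer id."""
    labels = sorted({label for examples in datasets.values() for _, label in examples})
    return {label: idx for idx, label in enumerate(labels)}


def blend(
    datasets: Dict[str, List[Tuple[str, str]]], label_to_id: Dict[str, int]
) -> List[Tuple[str, int, str]]:
    """Merge all languages into one training list of (text, label_id, language)."""
    blended = []
    for lang, examples in datasets.items():
        for text, label in examples:
            blended.append((text, label_to_id[label], lang))
    return blended


label_to_id = build_union_label_set(DATASETS)
train_set = blend(DATASETS, label_to_id)
```

In a full pipeline, each text in `train_set` would presumably be encoded with a multilingual sentence encoder before a single classifier is trained over the shared label ids.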

classification, cross-lingual text classification, text classification, (14 more...)

doi: 10.1109/I2CT61223.2024.10543381

arXiv: 2406.11028

Country:

Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Europe > Czechia > Prague (0.04)
Asia > India (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)

A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification

Nishikawa, Sosuke, Yamada, Ikuya, Tsuruoka, Yoshimasa, Echizen, Isao

arXiv.org Artificial Intelligence, Oct-11-2022

... learning, models are trained on annotated data in a resource-rich language (the source language) and then applied to another language (the target language) without any training. Substantial progress in cross-lingual transfer learning has been made using multilingual pre-trained language models (PLMs), such as multilingual BERT (M-BERT), jointly trained on massive corpora in multiple languages (Devlin et al., 2019; Conneau and Lample, 2019; Conneau et al., 2020a). However, recent empirical studies have found that cross-lingual transfer ...

... Inspired by previous work (Yamada and Shindo, 2019; Peters et al., 2019), we compute the weights using an attention mechanism that selects the entities relevant to the given document. We then compute the sum of the entity-based document representation and the text-based document representation computed using the PLM and feed it into a linear classifier. Since the entity vocabulary and entity embedding are shared across languages, a model trained on entity features in the source language can be directly transferred to multiple target languages.
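The entity-attention step described in this excerpt can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the attention scores here are plain dot products between the text vector and each entity embedding (the excerpt does not specify the scoring function), and the function names, vectors, and classifier weights are all hypothetical toy values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entity_attention(text_vec: Vector, entity_vecs: List[Vector]) -> Tuple[List[float], Vector]:
    """Weight entity embeddings by relevance to the document and sum them.

    Assumption: relevance is the dot product with the text representation.
    """
    weights = softmax([dot(text_vec, e) for e in entity_vecs])
    dim = len(text_vec)
    entity_doc = [sum(w * e[i] for w, e in zip(weights, entity_vecs)) for i in range(dim)]
    return weights, entity_doc


def classify(text_vec: Vector, entity_vecs: List[Vector], W: List[Vector], b: Vector) -> Vector:
    """Sum the entity-based and text-based representations, then apply a linear layer."""
    _, entity_doc = entity_attention(text_vec, entity_vecs)
    combined = [t + e for t, e in zip(text_vec, entity_doc)]
    return [dot(row, combined) + bias for row, bias in zip(W, b)]


# Hypothetical toy inputs: a 2-dimensional text vector and two entity embeddings.
text_vec = [1.0, 0.0]
entity_vecs = [[1.0, 0.0], [0.0, 1.0]]
weights, entity_doc = entity_attention(text_vec, entity_vecs)
logits = classify(text_vec, entity_vecs, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0])
```

Because the entity embeddings are indexed by entity rather than by language, the same `entity_vecs` lookup would apply to documents in any language, which is the property the excerpt relies on for transfer.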

classification, information retrieval, machine learning, (20 more...)

arXiv: 2110.07792

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Europe > Germany (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.68)
